Convenience packages

Elizabeth King
Kevin Middleton

Convenience packages

  • rethinking
    • Translates to Stan code
    • You provide priors
  • brms
    • Translates to Stan code
    • You can/should provide priors
  • rstanarm
    • Handles everything, very fast
    • You can/should provide priors

Inconvenience packages

  • rstan
  • cmdstanr

Inconvenience in one context is convenience in another.

  • Custom models, calculated values, etc.
  • Precompile and reuse models
    • Pass updated data
    • HPC cluster processing
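With cmdstanr, for example, a model can be compiled once and then re-sampled with updated data. A minimal sketch (the file name and data objects are placeholders):

```r
library(cmdstanr)

# Compile once; cmdstanr caches the compiled executable on disk
mod <- cmdstan_model("model.stan")   # placeholder path to a Stan program

# Reuse the compiled model with different data sets (placeholder objects)
fit1 <- mod$sample(data = list(N = 50, x = x1, y = y1))
fit2 <- mod$sample(data = list(N = 50, x = x2, y = y2))
```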

A few words about cmdstanr

  • Not available from CRAN (yet)
  • Updates more frequently than rstan
    • New features
    • (Often) faster compilation and sampling
  • rethinking and brms can use it as the backend sampler
  • Portable Stan output files (R, Python, etc.)
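Because it is not on CRAN, cmdstanr is installed from the Stan team's R-universe repository, and the CmdStan toolchain is installed separately:

```r
# Install the R package from the Stan R-universe repository
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev",
                           getOption("repos")))

# Then download and build CmdStan itself (one-time setup)
cmdstanr::install_cmdstan()
```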

rethinking

  • Associated with the Statistical Rethinking book
    • Many recipes for basic and advanced models
  • Automates writing stan code (rethinking::stancode())
    • Increasingly complicated model structures
  • Recommended for general use until you feel confident
    • Manual post-processing
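As a sketch, the example model used later in these slides could be written in rethinking syntax. quap() uses a quadratic approximation; swapping in ulam() samples via Stan instead, and stancode() on an ulam fit returns the generated Stan code. The exponential prior on sigma is our choice for illustration, not from the slides:

```r
library(rethinking)

# Quadratic approximation of the posterior; ulam() uses Stan/MCMC instead
fm_rt <- quap(
  alist(
    y ~ dnorm(mu, sigma),   # likelihood
    mu <- a + b * x,        # linear model
    a ~ dnorm(0, 10),       # prior on the intercept
    b ~ dnorm(0, 5),        # prior on the slope
    sigma ~ dexp(1)         # prior on sigma (our choice here)
  ),
  data = D
)
precis(fm_rt)
```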

brms

  • Uses R formula syntax
  • Very flexible, many vignettes
    • Multilevel, nonlinear, phylogenetic, missing data
  • Works well with plotting and processing tools (this lecture)
  • Works well with model comparison tools (next week)

rstanarm

  • Uses R formula syntax
  • By default, centers predictors and autoscales priors based on the data
  • Predefined range of models:
    • stan_lm(), stan_glm(), stan_lmer(), stan_aov(), etc.
  • Wide range of vignettes
  • Some priors are complex to define
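For comparison, a sketch of the example model (using the simulated data D defined on the next slide) with rstanarm's stan_glm(); the prior values mirror those used with brms later:

```r
library(rstanarm)

# stan_glm() with a gaussian family is a Bayesian linear regression
fm_arm <- stan_glm(y ~ x,
                   data = D,
                   family = gaussian(),
                   prior_intercept = normal(0, 10),  # prior on the intercept
                   prior = normal(0, 5),             # prior on the slope(s)
                   seed = 987655)
summary(fm_arm)
```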

Example model

library(tidyverse)

set.seed(746283)
D <- tibble(x = runif(50, 0, 10),
            y = 5 + x * -2 + rnorm(50, 0, 2))
ggplot(D, aes(x, y)) + geom_point()

What variables can have priors in brms?

Given a model and data, brms::get_prior() lists the default priors:

library(brms)
get_prior(y ~ x + 1, data = D)
                   prior     class coef group resp dpar nlpar lb ub       source
                  (flat)         b                                       default
                  (flat)         b    x                             (vectorized)
 student_t(3, -3.4, 7.8) Intercept                                       default
    student_t(3, 0, 7.8)     sigma                             0         default

Example model

priors <- 
  prior(normal(0, 10), class = "Intercept") +
  prior(normal(0, 5), class = "b", coef = "x")
fm <- brm(y ~ x + 1,
          data = D,
          prior = priors,
          seed = 987655,
          iter = 5000,
          backend = "cmdstanr")

Priors

prior_summary(fm)
                prior     class coef group resp dpar nlpar lb ub  source
               (flat)         b                                  default
         normal(0, 5)         b    x                                user
        normal(0, 10) Intercept                                     user
 student_t(3, 0, 7.8)     sigma                             0    default

Summary

summary(fm)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x + 1 
   Data: D (Number of observations: 50) 
  Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
         total post-warmup draws = 10000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     4.89      0.60     3.71     6.08 1.00     8731     6637
x            -1.93      0.11    -2.14    -1.72 1.00     8363     6502

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     2.18      0.23     1.80     2.69 1.00     8622     6986

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

posterior

  • Most plotting and summary actions need the samples (“draws”) from the posterior
  • posterior extracts these in useful formats
  • Retain the chains or not?
    • Yes for chain diagnostic plots
    • Not necessary for summaries
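For example, posterior::summarise_draws() computes per-variable summaries and convergence diagnostics directly from the draws (assuming the fitted model fm from above):

```r
library(posterior)

# Mean, median, sd, quantiles, rhat, ess_bulk, and ess_tail per variable
summarise_draws(as_draws_df(fm))

# Or restrict to the convergence diagnostics
summarise_draws(as_draws_df(fm), default_convergence_measures())
```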

Extracting posteriors

library(posterior)

# Combined
post_combined_mat <- as_draws_matrix(fm)
post_combined_df <- as_draws_df(fm)

# With chains as 3rd dimension
post_chains <- as_draws_array(fm)

Posteriors

str(post_combined_df)
draws_df [10,000 × 8] (S3: draws_df/draws/tbl_df/tbl/data.frame)
 $ b_Intercept: num [1:10000] 5.08 4.64 5.19 4.64 4.26 ...
 $ b_x        : num [1:10000] -1.94 -1.96 -1.94 -1.92 -1.87 ...
 $ sigma      : num [1:10000] 2.34 1.99 2.07 2.23 2.22 ...
 $ lprior     : num [1:10000] -8.33 -8.34 -8.32 -8.34 -8.35 ...
 $ lp__       : num [1:10000] -116 -116 -116 -116 -116 ...
 $ .chain     : int [1:10000] 1 1 1 1 1 1 1 1 1 1 ...
 $ .iteration : int [1:10000] 1 2 3 4 5 6 7 8 9 10 ...
 $ .draw      : int [1:10000] 1 2 3 4 5 6 7 8 9 10 ...

Drop unused columns

  • Or use pars = to select variables for plotting
post <- post_combined_df |> 
  select(-lprior, -`lp__`)
str(post)
draws_df [10,000 × 6] (S3: draws_df/draws/tbl_df/tbl/data.frame)
 $ b_Intercept: num [1:10000] 5.08 4.64 5.19 4.64 4.26 ...
 $ b_x        : num [1:10000] -1.94 -1.96 -1.94 -1.92 -1.87 ...
 $ sigma      : num [1:10000] 2.34 1.99 2.07 2.23 2.22 ...
 $ .chain     : int [1:10000] 1 1 1 1 1 1 1 1 1 1 ...
 $ .iteration : int [1:10000] 1 2 3 4 5 6 7 8 9 10 ...
 $ .draw      : int [1:10000] 1 2 3 4 5 6 7 8 9 10 ...

bayesplot

Chain diagnostics

  • Trace plots: mcmc_trace()
  • Rank histograms: mcmc_rank_overlay()
  • Autocorrelation: mcmc_acf()
  • Combination plots: mcmc_combo()

mcmc_trace()

mcmc_trace(post)

mcmc_rank_overlay()

mcmc_rank_overlay(post)

mcmc_combo()

mcmc_combo(post)

Prior and posterior predictive distributions

  • Sample from the prior
    • See if the range of values is plausible
    • Helpful for multilevel models
  • Plot the posterior
    • See if it makes sense given the data

Sampling from the prior

  • sample_prior = "only" ignores the data and samples the prior only.
  • Data must still be provided (only its structure is used)
PP <- brm(y ~ x + 1,
          data = D,
          prior = priors,
          sample_prior = "only",
          seed = 4628346,
          iter = 5000,
          backend = "cmdstanr")

Prior/posterior predictive distributions

  • Same approach except for using the data
  • pp_check(): gives access to many different bayesplot checks for brm() models
    • Density
    • Histogram
    • Mean/median/min/max
    • Ribbon
    • “Residuals”

Prior predictive distribution

Density overlay of 50 draws from the prior (PP)

pp_check(PP, type = "dens_overlay", ndraws = 50)

Prior predictive distribution

Median of \(y\) plotted over a histogram of prior predicted values for \(y\)

  • Is the prior prediction range reasonable for \(y\)?
pp_check(PP, type = "stat", stat = "median")

Posterior predictive check

Density overlay of 50 draws from the posterior (fm)

  • Does the posterior adequately model the data?
pp_check(fm, type = "dens_overlay", ndraws = 50)

Posterior predictive check

Posterior predicted intervals and observed vs. \(x\)

  • Most points should fall within the bounds
pp_check(fm, type = "ribbon", x = "x")

Posterior predictive check

“Residual” mean predicted error vs. \(x\)

  • Normally distributed residuals around the mean line
  • Looking for no structure in the residuals
pp_check(fm, type = "error_scatter_avg_vs_x", x = "x", size = 2)

Visual posterior summaries

  • Density: mcmc_dens(), mcmc_dens_overlay(), mcmc_areas()
  • Intervals: mcmc_intervals()

mcmc_dens()

mcmc_dens(post)

mcmc_dens_overlay()

mcmc_dens_overlay(post)

mcmc_areas()

mcmc_areas(post)

tidybayes: tidy data + ggplot

  • Visualizing and summarizing posteriors
  • Adding predicted values
  • Challenging to learn, but powerful once you do
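A sketch of both uses, assuming the brms fit fm from earlier in these slides:

```r
library(tidybayes)
library(tidyverse)

# Long-format ("tidy") draws for selected parameters
fm |> spread_draws(b_Intercept, b_x)

# Posterior expected values over a grid of x, ready for ggplot
tibble(x = seq(0, 10, length.out = 100)) |>
  add_epred_draws(fm) |>
  ggplot(aes(x = x, y = .epred)) +
  stat_lineribbon() +
  geom_point(data = D, aes(y = y))
```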